34 research outputs found

Integrative Analysis Frameworks for Improved Peptide and Protein Identifications from Tandem Mass Spectrometry Data.

Author: Shanmugam Avinash Kumar
Publication date: 01/01/2015

Tandem mass spectrometry (MS/MS) followed by database search is the method of choice for high-throughput protein identification in modern proteomic studies. Database searching methods employ spectral matching algorithms and statistical models to identify and quantify proteins in a sample. The major focus of these statistical methods is to assign probability scores to the identifications to distinguish between high confidence, reliable identifications that may be accepted (typically corresponding to a false discovery rate, FDR, of 1% or 5%) and lower confidence, spurious identifications that are rejected. These identification probabilities are determined, in general, considering only evidence from the MS/MS data. However, considering the wealth of external (orthogonal) data available for most biological systems, integrating such orthogonal information into proteomics analysis pipelines can be a promising approach to improve the sensitivity of these analysis pipelines and rescue true positive identifications that were rejected for want of sufficient evidence supporting their presence. In this dissertation, approaches based on naive Bayes rescoring, search space restriction, and a hybrid approach that combines both are described for integrating orthogonal information in proteomic analysis pipelines. These methods have been applied for integrating transcript abundance data from RNA-seq and identification frequency data from the Global Proteome Machine database, GPMDB (one of the largest repositories of proteomic experiment results), into analysis pipelines, improving the number of peptide and protein identifications from MS/MS data. Further, estimation of false discovery rates in very large proteomic datasets was also investigated. In very large datasets, usually resulting from integrating data from multiple experiments, some assumptions used in typical target-decoy-based FDR estimation in smaller datasets no longer hold true, resulting in artificially inflated error rates. Alternative approaches that would allow accurate FDR estimation in these large-scale datasets have been described and benchmarked.
PhD, Bioinformatics, University of Michigan, Horace H. Rackham School of Graduate Studies
http://deepblue.lib.umich.edu/bitstream/2027.42/116717/1/avinashs_1.pd
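For readers unfamiliar with the target-decoy FDR estimation that the abstract above refers to, the following is a minimal sketch of the common small-dataset estimate: the number of decoy matches above a score threshold divided by the number of target matches above it. The function name, the score values, and the decoy flags are hypothetical and chosen only for illustration. This is the conventional estimate, not the alternative large-dataset approach developed in the dissertation.

```python
def target_decoy_fdr(scores, is_decoy, threshold):
    """Estimate FDR at a score threshold as (#decoy hits) / (#target hits).

    scores   : list of match scores (higher is better); hypothetical values
    is_decoy : list of booleans, True when the match is to a decoy sequence
    """
    targets = sum(1 for s, d in zip(scores, is_decoy) if s >= threshold and not d)
    decoys = sum(1 for s, d in zip(scores, is_decoy) if s >= threshold and d)
    if targets == 0:
        return 0.0  # no accepted target matches, so nothing to estimate
    return min(1.0, decoys / targets)


# Hypothetical example data: eight matches, two of them decoys.
scores = [9.1, 8.7, 8.2, 7.9, 7.5, 6.8, 6.1, 5.4]
is_decoy = [False, False, False, True, False, False, True, False]

print(target_decoy_fdr(scores, is_decoy, 6.0))
print(target_decoy_fdr(scores, is_decoy, 8.0))
```

Lowering the threshold accepts more matches but also more decoys, which is why accepted sets are usually reported at a fixed estimated FDR such as 1% or 5%.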

Deep Blue Documents at the University of Michigan

Highlights from the ISCB Student Council Symposium 2013

Author: Anupama Jigisha
Avinash Shanmugam
Cynthia Prudence
Emre Guney
Esmeralda Vicedo
Tomás Di Domenico
Publication venue: Springer Nature
Publication date: 01/01/2014

This report summarizes the scientific content and activities of the annual symposium organized by the Student Council of the International Society for Computational Biology (ISCB), held in conjunction with the Intelligent Systems for Molecular Biology (ISMB) / European Conference on Computational Biology (ECCB) conference in Berlin, Germany, on July 19, 2013.

Springer - Publisher Connector

PubMed Central

Effective Leveraging of Targeted Search Spaces for Improving Peptide Identification in Tandem Mass Spectrometry Based Proteomics

Author: Alexey I. Nesvizhskii
Avinash K. Shanmugam
Publication venue: American Chemical Society (ACS)

Crossref

Effective Leveraging of Targeted Search Spaces for Improving Peptide Identification in Tandem Mass Spectrometry Based Proteomics

Author: Alexey I. Nesvizhskii
Avinash K. Shanmugam

In shotgun proteomics, peptides are typically identified using database searching, which involves scoring acquired tandem mass spectra against peptides derived from standard protein sequence databases such as UniProt, RefSeq, or Ensembl. In this strategy, the sensitivity of peptide identification is known to be affected by the size of the search space. Therefore, creating a targeted sequence database containing only peptides likely to be present in the analyzed sample can be a useful technique for improving the sensitivity of peptide identification. In this study, we describe how targeted peptide databases can be created based on the frequency of identification in the Global Proteome Machine Database (GPMDB), the largest publicly available repository of peptide and protein identification data. We demonstrate that targeted peptide databases can be easily integrated into existing proteome analysis workflows and describe a computational strategy for minimizing any loss of peptide identifications arising from potential search space incompleteness in the targeted search spaces. We demonstrate the performance of our workflow using several data sets of varying size and sample complexity.
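The idea of a frequency-based targeted search space described in the abstract above can be illustrated with a minimal sketch: keep only peptides whose prior identification count meets a cutoff. The function name, the peptide sequences, the counts, and the cutoff value are all hypothetical. The sketch does not query GPMDB and does not reproduce the authors' workflow or their strategy for handling search space incompleteness.

```python
def build_targeted_database(observation_counts, min_observations):
    """Return peptides observed at least `min_observations` times, sorted by sequence.

    observation_counts : dict mapping peptide sequence -> number of prior
                         identifications (hypothetical values, not real GPMDB data)
    """
    return sorted(
        peptide
        for peptide, count in observation_counts.items()
        if count >= min_observations
    )


# Hypothetical identification frequencies for four peptides.
counts = {
    "LVNELTEFAK": 120,
    "AEFVEVTK": 45,
    "QTALVELLK": 3,
    "YLYEIAR": 0,
}

targeted = build_targeted_database(counts, min_observations=10)
print(targeted)  # ['AEFVEVTK', 'LVNELTEFAK']
```

A stricter cutoff yields a smaller search space, at the risk of excluding peptides that are genuinely present in the sample, which is the incompleteness problem the abstract mentions.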

FigShare

Ten Simple Rules for Starting a Regional Student Group

Author: Avinash Kumar Shanmugam
Geoff Macintyre
Magali Michaut
Philip E. Bourne
Thomas Abeel
Publication venue: Public Library of Science (PLoS)

Crossref

Utility of RNA-seq and GPMDB Protein Observation Frequency for Improving the Sensitivity of Protein Identification by Tandem MS

Author: Alexey I. Nesvizhskii
Anastasia K. Yocum
Avinash K. Shanmugam

Tandem mass spectrometry (MS/MS) followed by database search is the method of choice for protein identification in proteomic studies. Database searching methods employ spectral matching algorithms and statistical models to identify and quantify proteins in a sample. In general, these methods do not utilize any information other than spectral data for protein identification. However, considering the wealth of external data available for many biological systems, analysis methods can incorporate such information to improve the sensitivity of protein identification. In this study, we present a method to utilize Global Proteome Machine Database identification frequencies and RNA-seq transcript abundances to adjust the confidence scores of protein identifications. The method described is particularly useful for samples with low-to-moderate proteome coverage (i.e., <2000–3000 proteins), where we observe up to an 8% improvement in the number of proteins identified at a 1% false discovery rate.
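One generic way to adjust an identification probability with external evidence is a naive Bayes style update, in which the MS/MS-based odds are multiplied by a likelihood ratio for each independent evidence source. The sketch below illustrates that general technique only. The function name and the likelihood ratio values are hypothetical, the conditional independence of the evidence sources is an assumption, and nothing here should be read as the exact model used in the study.

```python
def adjust_probability(p_msms, likelihood_ratios):
    """Update an MS/MS-based identification probability with external evidence.

    p_msms            : probability that the identification is correct,
                        based on spectral evidence alone
    likelihood_ratios : one ratio per evidence source, P(evidence | correct) /
                        P(evidence | incorrect); hypothetical values, assumed
                        conditionally independent (the naive Bayes assumption)
    """
    if p_msms <= 0.0:
        return 0.0
    if p_msms >= 1.0:
        return 1.0
    odds = p_msms / (1.0 - p_msms)  # convert probability to odds
    for ratio in likelihood_ratios:
        odds *= ratio  # each source scales the odds independently
    return odds / (1.0 + odds)  # convert odds back to a probability


# Hypothetical example: a borderline identification supported by two
# external sources (e.g. transcript abundance and prior observation frequency).
adjusted = adjust_probability(0.5, [3.0, 2.0])
print(round(adjusted, 4))
```

Ratios above 1 raise the probability and ratios below 1 lower it, so supportive external data can rescue a borderline identification while unsupportive data can demote it.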

CiteSeerX

FigShare

Highlights from the 1st ISCB Latin American Student Council Symposium 2014

Author: Hasenahuer Marcia Anahí
Olguin-Orellana Gabriel J.
Parra Rodrigo Gonzalo
Shanmugam Avinash K.
Simonetti Franco Lucio
Publication venue: Springer Science and Business Media LLC
Publication date: 30/04/2015

This report summarizes the scientific content and activities of the first edition of the Latin American Symposium organized by the Student Council of the International Society for Computational Biology (ISCB), held in conjunction with the Third Latin American conference from the International Society for Computational Biology (ISCB-LA 2014) in Belo Horizonte, Brazil, on October 27, 2014.

Affiliation: Parra, Rodrigo Gonzalo. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Ciudad Universitaria. Instituto de Química Biológica de la Facultad de Ciencias Exactas y Naturales. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Instituto de Química Biológica de la Facultad de Ciencias Exactas y Naturales; Argentina
Affiliation: Simonetti, Franco Lucio. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Parque Centenario. Instituto de Investigaciones Bioquímicas de Buenos Aires. Fundación Instituto Leloir. Instituto de Investigaciones Bioquímicas de Buenos Aires; Argentina
Affiliation: Hasenahuer, Marcia Anahí. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - La Plata. Instituto Multidisciplinario de Biología Celular. Grupo Vinculado al IMBICE - Grupo de Biología Estructural y Biotecnología-Universidad Nacional de Quilmes - GBEyB | Provincia de Buenos Aires. Gobernación. Comisión de Investigaciones Científicas. Instituto Multidisciplinario de Biología Celular. Grupo Vinculado al IMBICE - Grupo de Biología Estructural y Biotecnología-Universidad Nacional de Quilmes - GBEyB | Universidad Nacional de la Plata. Instituto Multidisciplinario de Biología Celular. Grupo Vinculado al IMBICE - Grupo de Biología Estructural y Biotecnología-Universidad Nacional de Quilmes - GBEyB; Argentina
Affiliation: Olguin-Orellana, Gabriel J. Pontificia Universidad Católica de Chile; Chile
Affiliation: Shanmugam, Avinash K. University of Michigan; United States

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

CONICET Digital

PubMed Central

Deep Blue Documents at the University of Michigan

Reporting of Mosaics as High-level and Low level mosaics makes more number of embryos available as alternatives for transfer when no euploid embryos are available

Author: Avinash Shanmugam
Darren Griffin
Lia Ribustello
Mike Large
Pere Colls
Santiago Munne
Sarthak Sawarkar
Publication venue: Elsevier BV

Crossref